Fast Discovery of Sim

Author

  • Pedro Domingos
Abstract

The recent emergence of data mining as a major application of machine learning has led to increased interest in fast rule induction algorithms. These are able to efficiently process large numbers of examples while still achieving good accuracy. If e is the number of examples, many rule learners have O(e⁴) asymptotic time complexity in noisy domains, and C4.5RULES has been empirically observed to sometimes require O(e³) time. Recent advances have brought this bound down to O(e log² e), while maintaining accuracy at the level of C4.5RULES (Cohen 1995). Ideally, we would like an algorithm capable of inducing accurate rules in time linear in e, without becoming too expensive in other factors. This extended abstract presents such an algorithm.

Most rule induction algorithms employ a “separate and conquer” method, inducing each rule to its full length before going on to the next one. They also evaluate each rule by itself, without regard to the effect of other rules. This is a potentially inefficient approach: rules may be grown further than they need to be, only to be pruned back afterwards, once the whole rule set has been induced. An alternative is to interleave the construction of all rules, evaluating each rule in the context of the current rule set. This can be termed a “conquering without separating” approach, in contrast to the earlier method, and has been implemented in the CWS algorithm.

CWS is outlined in pseudo-code in Table 1. All examples are initially assigned to the majority class. Each rule in CWS is associated with a vector of class probabilities computed from the examples it covers, and predicts the most probable class. Conflicts are resolved by summing the probabilities for all rules covering the test instance and choosing the class with the highest sum. Acc(RS) is the accuracy of the rule set RS on the training set.
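Table 1 is not reproduced on this page, so the following Python is only a minimal sketch under stated assumptions, not the author's CWS. Rules are assumed to be conjunctions of attribute = value tests, and the search loop in the hypothetical `cws_sketch` (each step either adds one test to an existing rule or starts a new one-test rule, keeping the single change that most raises Acc(RS), and stopping when nothing helps) is an assumed reading of “conquering without separating”. The parts taken from the abstract are the majority-class default, the per-rule class-probability vectors, and conflict resolution by summing those vectors. All function names (`covers`, `class_probs`, `predict`, `accuracy`, `cws_sketch`) are hypothetical, and the sketch recomputes everything from scratch, so it does not attain the linear-time bound claimed for CWS.

```python
from collections import Counter

# Hypothetical representation (not from the paper): an example is a dict
# mapping attribute -> value; a rule is a frozenset of (attribute, value)
# equality tests, and a rule set is a list of (rule, class_probabilities).


def covers(rule, x):
    """True if every (attribute, value) test in the rule holds for x."""
    return all(x.get(a) == v for a, v in rule)


def class_probs(rule, X, y, classes):
    """Class-probability vector estimated from the training examples the rule covers."""
    counts = Counter(label for x, label in zip(X, y) if covers(rule, x))
    total = sum(counts.values())
    if total == 0:
        return None  # the rule covers nothing, so it has no estimate
    return {c: counts[c] / total for c in classes}


def predict(rule_set, x, default):
    """Sum the probability vectors of all rules covering x and return the class
    with the highest sum; uncovered examples get the default (majority) class."""
    sums = Counter()
    for rule, probs in rule_set:
        if covers(rule, x):
            for c, p in probs.items():
                sums[c] += p
    if not sums:
        return default
    return max(sums, key=sums.get)


def accuracy(rule_set, X, y, default):
    """Acc(RS): fraction of training examples the rule set classifies correctly."""
    hits = sum(predict(rule_set, x, default) == label for x, label in zip(X, y))
    return hits / len(y)


def cws_sketch(X, y):
    """Assumed greedy loop: each step either adds one test to an existing rule or
    starts a new one-test rule, keeping the single change that most improves
    Acc(RS); it stops when no change helps. Everything is recomputed from
    scratch, so this is NOT linear in the number of examples."""
    classes = sorted(set(y))
    default = Counter(y).most_common(1)[0][0]  # all examples start in the majority class
    tests = sorted({(a, v) for x in X for a, v in x.items()})

    def with_probs(rules):
        # Attach class probabilities; drop rules that cover no training example.
        out = []
        for r in rules:
            p = class_probs(r, X, y, classes)
            if p is not None:
                out.append((r, p))
        return out

    rules = []
    best_acc = accuracy([], X, y, default)
    while True:
        best_rules = None
        for i in range(len(rules) + 1):  # i == len(rules) means "start a new rule"
            base = rules[i] if i < len(rules) else frozenset()
            for t in tests:
                if t in base:
                    continue
                candidate = rules[:i] + [base | {t}] + rules[i + 1:]
                acc = accuracy(with_probs(candidate), X, y, default)
                if acc > best_acc:
                    best_acc, best_rules = acc, candidate
        if best_rules is None:
            return with_probs(rules), default
        rules = best_rules
```

For example, on five examples where class `pos` occurs exactly when `a == 1`, the sketch starts from the majority class `neg` (accuracy 0.6), adds the single rule `a = 1` (accuracy 1.0), and then stops because no further change improves Acc(RS).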
This procedure would not be efficient if implemented directly, but by avoiding the extensive redundancy in the repeated computation of accuracies and class probabilities, the worst-case time complexity of CWS can be made linear in e and in all other relevant parameters. CWS has been extensively evaluated using benchmark problems, a large artificial dataset, and a detailed

Publication date: 1999